GH-49103: [Python] Add internal type system stubs (_types, error, _stubs_typing) #48622
Draft
rok wants to merge 37 commits into apache:main from
Comment on lines +49 to +60
    Mask: TypeAlias = (
        Sequence[bool | None]
        | NDArray[np.bool_]
        | BooleanArray
        | ChunkedArray[Any]
    )
    Indices: TypeAlias = (
        Sequence[int | None]
        | NDArray[np.integer[Any]]
        | IntegerArray
        | ChunkedArray[Any]
    )
This isn't the most exciting suggestion, but it's something that constantly frustrates me 😅
Suggested change
    - Mask: TypeAlias = (
    -     Sequence[bool | None]
    -     | NDArray[np.bool_]
    -     | BooleanArray
    -     | ChunkedArray[Any]
    - )
    - Indices: TypeAlias = (
    -     Sequence[int | None]
    -     | NDArray[np.integer[Any]]
    -     | IntegerArray
    -     | ChunkedArray[Any]
    - )
    + from pyarrow import lib
    + IntegerType: TypeAlias = (
    +     lib.Int8Type
    +     | lib.Int16Type
    +     | lib.Int32Type
    +     | lib.Int64Type
    +     | lib.UInt8Type
    +     | lib.UInt16Type
    +     | lib.UInt32Type
    +     | lib.UInt64Type
    + )
    + Mask: TypeAlias = (
    +     Sequence[bool | None]
    +     | NDArray[np.bool_]
    +     | lib.Array[lib.Scalar[lib.BoolType]]
    +     | ChunkedArray[Any]
    + )
    + Indices: TypeAlias = (
    +     Sequence[int | None]
    +     | NDArray[np.integer[Any]]
    +     | lib.Array[lib.Scalar[IntegerType]]
    +     | ChunkedArray[Any]
    + )
An alternative would be to just use Array[Any].
Using the concrete subclasses requires either the stubs to do a carefully choreographed dance or the user to typing.cast everywhere, sadly.
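To make the trade-off concrete, here is a minimal sketch using toy stand-in classes rather than the real pyarrow types. Everything in it (`Array`, `Scalar`, `BoolType`, `BooleanArray`, `is_valid`, `MaskConcrete`, `MaskGeneric`, and the two filter functions) is a hypothetical model written for illustration, not the PR's stubs. It assumes a kernel whose annotation returns the generic form `Array[Scalar[BoolType]]`, in which case a static checker such as mypy or pyright would be expected to reject passing that result to a parameter typed with the concrete `BooleanArray` unless the caller adds a `typing.cast`.

```python
from __future__ import annotations

from typing import Any, Generic, Optional, Sequence, TypeVar, Union, cast


# Toy stand-ins (hypothetical, not the real pyarrow classes).
class DataType: ...
class BoolType(DataType): ...


_T = TypeVar("_T", bound=DataType)


class Scalar(Generic[_T]): ...


_S = TypeVar("_S", bound="Scalar[Any]")


class Array(Generic[_S]):
    def __init__(self, values: Sequence[Any]) -> None:
        self.values = list(values)


class BooleanArray(Array[Scalar[BoolType]]): ...


# Alias written against the concrete subclass (as in the original lines).
MaskConcrete = Union[Sequence[Optional[bool]], BooleanArray]
# Alias written against the generic form (as in the suggestion).
MaskGeneric = Union[Sequence[Optional[bool]], Array[Scalar[BoolType]]]


def is_valid(arr: Array[Any]) -> Array[Scalar[BoolType]]:
    """Stand-in for a kernel annotated with the generic return type."""
    return Array([v is not None for v in arr.values])


def _mask_values(mask: Any) -> list[Any]:
    return list(mask.values) if isinstance(mask, Array) else list(mask)


def filter_concrete(arr: Array[Any], mask: MaskConcrete) -> list[Any]:
    return [v for v, keep in zip(arr.values, _mask_values(mask)) if keep]


def filter_generic(arr: Array[Any], mask: MaskGeneric) -> list[Any]:
    return [v for v, keep in zip(arr.values, _mask_values(mask)) if keep]


data: Array[Any] = Array([1, None, 3])
valid = is_valid(data)

# Accepted as-is: the generic alias matches the kernel's annotated return type.
kept_generic = filter_generic(data, valid)

# With the concrete alias, a checker needs a cast here, because
# Array[Scalar[BoolType]] is not a BooleanArray.
kept_concrete = filter_concrete(data, cast(BooleanArray, valid))
```

Both calls behave identically at runtime; the difference only shows up under a type checker, which is why the choice of alias matters mainly for how often downstream users have to reach for `cast`.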
d3c5740 to 27d1c65
Member
Author
I've rebased this on the annotation infra check PR (#48618) to make sure we're on the right track.
3f9ed3b to 0ac95b0
…d script for including docstrings into stubfiles before building wheels. diff --git c/.github/workflows/python.yml i/.github/workflows/python.yml index e5d3679..4ca0f9b 100644 --- c/.github/workflows/python.yml +++ i/.github/workflows/python.yml @@ -239,6 +239,11 @@ jobs: - name: Test shell: bash run: ci/scripts/python_test.sh $(pwd) $(pwd)/build + - name: Test annotations + shell: bash + env: + PYARROW_TEST_ANNOTATIONS: "ON" + run: ci/scripts/python_test_type_annotations.sh $(pwd)/python windows: name: AMD64 Windows 2022 Python 3.13 @@ -296,3 +301,7 @@ jobs: shell: cmd run: | call "ci\scripts\python_test.bat" %cd% + - name: Test annotations + shell: cmd + run: | + call "ci\scripts\python_test_type_annotations.bat" %cd%\python diff --git c/ci/scripts/python_test_type_annotations.bat i/ci/scripts/python_test_type_annotations.bat new file mode 100644 index 0000000000..3446e32 --- /dev/null +++ i/ci/scripts/python_test_type_annotations.bat @@ -0,0 +1,38 @@ +@Rem Licensed to the Apache Software Foundation (ASF) under one +@Rem or more contributor license agreements. See the NOTICE file +@Rem distributed with this work for additional information +@Rem regarding copyright ownership. The ASF licenses this file +@Rem to you under the Apache License, Version 2.0 (the +@Rem "License"); you may not use this file except in compliance +@Rem with the License. You may obtain a copy of the License at +@Rem +@Rem http://www.apache.org/licenses/LICENSE-2.0 +@Rem +@Rem Unless required by applicable law or agreed to in writing, +@Rem software distributed under the License is distributed on an +@Rem "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +@Rem KIND, either express or implied. See the License for the +@Rem specific language governing permissions and limitations +@Rem under the License. + +@echo on + +set PYARROW_DIR=%1 + +echo Annotation testing on Windows ... 
+ +@Rem Install library stubs +%PYTHON_CMD% -m pip install pandas-stubs scipy-stubs sphinx types-cffi types-psutil types-requests types-python-dateutil || exit /B 1 + +@Rem Install other dependencies for type checking +%PYTHON_CMD% -m pip install fsspec || exit /B 1 + +@Rem Install type checkers +%PYTHON_CMD% -m pip install mypy pyright ty || exit /B 1 + +@Rem Run type checkers +pushd %PYARROW_DIR% + +mypy +pyright +ty check diff --git c/ci/scripts/python_test_type_annotations.sh i/ci/scripts/python_test_type_annotations.sh new file mode 100755 index 0000000000..82610ce --- /dev/null +++ i/ci/scripts/python_test_type_annotations.sh @@ -0,0 +1,40 @@ +#!/usr/bin/env bash +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+ +set -ex +pyarrow_dir=${1} + +if [ "${PYARROW_TEST_ANNOTATIONS}" == "ON" ]; then + # Install library stubs + pip install pandas-stubs scipy-stubs sphinx types-cffi types-psutil types-requests types-python-dateutil + + # Install type checkers + pip install mypy pyright ty + + # Install other dependencies for type checking + pip install fsspec + + # Run type checkers + pushd ${pyarrow_dir} + mypy + pyright + ty check; +else + echo "Skipping type annotation tests"; +fi diff --git c/ci/scripts/python_wheel_macos_build.sh i/ci/scripts/python_wheel_macos_build.sh index bd61154..b64eee6 100755 --- c/ci/scripts/python_wheel_macos_build.sh +++ i/ci/scripts/python_wheel_macos_build.sh @@ -177,6 +177,11 @@ export CMAKE_PREFIX_PATH=${build_dir}/install export SETUPTOOLS_SCM_PRETEND_VERSION=${PYARROW_VERSION} pushd ${source_dir}/python +# We first populate stub docstrings and then build the wheel +python setup.py build_ext --inplace +python -m pip install griffe libcst +python ../dev/update_stub_docstrings.py pyarrow-stubs + python setup.py bdist_wheel popd diff --git c/ci/scripts/python_wheel_validate_contents.py i/ci/scripts/python_wheel_validate_contents.py index 84fcaba..ee4a31a 100644 --- c/ci/scripts/python_wheel_validate_contents.py +++ i/ci/scripts/python_wheel_validate_contents.py @@ -35,6 +35,11 @@ def validate_wheel(path): assert not outliers, f"Unexpected contents in wheel: {sorted(outliers)}" print(f"The wheel: {wheels[0]} seems valid.") + candidates = [info for info in f.filelist if info.filename.endswith('compute.pyi')] + assert candidates, "compute.pyi not found in wheel" + content = f.read(candidates[0]).decode('utf-8', errors='replace') + assert '"""' in content, "compute.pyi missing docstrings (no triple quotes found)" + def main(): parser = argparse.ArgumentParser() diff --git c/ci/scripts/python_wheel_windows_build.bat i/ci/scripts/python_wheel_windows_build.bat index b4b7fed..3da7f60 100644 --- c/ci/scripts/python_wheel_windows_build.bat +++ 
i/ci/scripts/python_wheel_windows_build.bat @@ -135,6 +135,11 @@ pushd C:\arrow\python @Rem Build wheel %PYTHON_CMD% setup.py bdist_wheel || exit /B 1 +@Rem We first populate stub docstrings and then build the wheel +%PYTHON_CMD% setup.py build_ext --inplace +%PYTHON_CMD% -m pip install griffe libcst +%PYTHON_CMD% ..\dev\update_stub_docstrings.py pyarrow-stubs + @Rem Repair the wheel with delvewheel @Rem @Rem Since we bundled the Arrow C++ libraries ourselves, we only need to diff --git c/ci/scripts/python_wheel_xlinux_build.sh i/ci/scripts/python_wheel_xlinux_build.sh index a3fbeb3..977ef64 100755 --- c/ci/scripts/python_wheel_xlinux_build.sh +++ i/ci/scripts/python_wheel_xlinux_build.sh @@ -167,6 +167,11 @@ export ARROW_HOME=/tmp/arrow-dist export CMAKE_PREFIX_PATH=/tmp/arrow-dist pushd /arrow/python +# We first populate stub docstrings and then build the wheel +python setup.py build_ext --inplace +python -m pip install griffe libcst +python ../dev/update_stub_docstrings.py pyarrow-stubs + python setup.py bdist_wheel echo "=== Strip symbols from wheel ===" diff --git c/compose.yaml i/compose.yaml index 2bd38a3..ae0a1d4 100644 --- c/compose.yaml +++ i/compose.yaml @@ -919,12 +919,14 @@ services: environment: <<: [*common, *ccache, *sccache] PYTEST_ARGS: # inherit + PYARROW_TEST_ANNOTATIONS: "ON" volumes: *conda-volumes command: &python-conda-command [" /arrow/ci/scripts/cpp_build.sh /arrow /build && /arrow/ci/scripts/python_build.sh /arrow /build && - /arrow/ci/scripts/python_test.sh /arrow"] + /arrow/ci/scripts/python_test.sh /arrow && + /arrow/ci/scripts/python_test_type_annotations.sh /arrow/python"] conda-python-emscripten: # Usage: @@ -1001,6 +1003,7 @@ services: ARROW_S3: "OFF" ARROW_SUBSTRAIT: "OFF" ARROW_WITH_OPENTELEMETRY: "OFF" + PYARROW_TEST_ANNOTATIONS: "ON" SETUPTOOLS_SCM_PRETEND_VERSION: volumes: *ubuntu-volumes deploy: *cuda-deploy @@ -1008,7 +1011,8 @@ services: /bin/bash -c " /arrow/ci/scripts/cpp_build.sh /arrow /build && 
/arrow/ci/scripts/python_build.sh /arrow /build && - /arrow/ci/scripts/python_test.sh /arrow" + /arrow/ci/scripts/python_test.sh /arrow && + /arrow/ci/scripts/python_test_type_annotations.sh /arrow/python" debian-python: # Usage: @@ -1500,6 +1504,7 @@ services: python: ${PYTHON} shm_size: *shm-size environment: + PYARROW_TEST_ANNOTATIONS: "ON" <<: [*common, *ccache, *sccache] PARQUET_REQUIRE_ENCRYPTION: # inherit HYPOTHESIS_PROFILE: # inherit @@ -1510,7 +1515,8 @@ services: /arrow/ci/scripts/cpp_build.sh /arrow /build && /arrow/ci/scripts/python_build.sh /arrow /build && mamba uninstall -y numpy && - /arrow/ci/scripts/python_test.sh /arrow"] + /arrow/ci/scripts/python_test.sh /arrow && + /arrow/ci/scripts/python_test_type_annotations.sh /arrow/python"] conda-python-docs: # Usage: @@ -1530,13 +1536,15 @@ services: BUILD_DOCS_CPP: "ON" BUILD_DOCS_PYTHON: "ON" PYTEST_ARGS: "--doctest-modules --doctest-cython" + PYARROW_TEST_ANNOTATIONS: "ON" volumes: *conda-volumes command: ["/arrow/ci/scripts/cpp_build.sh /arrow /build && /arrow/ci/scripts/python_build.sh /arrow /build && pip install -e /arrow/dev/archery[numpydoc] && archery numpydoc --allow-rule GL10,PR01,PR03,PR04,PR05,PR10,RT03,YD01 && - /arrow/ci/scripts/python_test.sh /arrow"] + /arrow/ci/scripts/python_test.sh /arrow && + /arrow/ci/scripts/python_test_type_annotations.sh /arrow/python"] conda-python-dask: # Possible $DASK parameters: diff --git c/docs/source/developers/python/development.rst i/docs/source/developers/python/development.rst index d03b243..c23891e 100644 --- c/docs/source/developers/python/development.rst +++ i/docs/source/developers/python/development.rst @@ -42,7 +42,7 @@ Unit Testing ============ We are using `pytest <https://docs.pytest.org/en/latest/>`_ to develop our unit -test suite. After `building the project <build_pyarrow>`_ you can run its unit tests +test suite. After `building the project <building.html>`_ you can run its unit tests like so: .. 
code-block:: @@ -101,6 +101,74 @@ The test groups currently include: * ``s3``: Tests for Amazon S3 * ``tensorflow``: Tests that involve TensorFlow +Type Checking +============= + +PyArrow provides type stubs (``*.pyi`` files) for static type checking. These +stubs are located in the ``pyarrow-stubs/`` directory and are automatically +included in the distributed wheel packages. + +Running Type Checkers +--------------------- + +We support multiple type checkers. Their configurations are in +``pyproject.toml``. + +**mypy** + +To run mypy on the PyArrow codebase: + +.. code-block:: + + $ cd arrow/python + $ mypy + +The mypy configuration is in the ``[tool.mypy]`` section of ``pyproject.toml``. + +**pyright** + +To run pyright: + +.. code-block:: + + $ cd arrow/python + $ pyright + +The pyright configuration is in the ``[tool.pyright]`` section of ``pyproject.toml``. + +**ty** + +To run ty (note: currently only partially configured): + +.. code-block:: + + $ cd arrow/python + $ ty check + +Maintaining Type Stubs +----------------------- + +Type stubs for PyArrow are maintained in the ``pyarrow-stubs/`` +directory. These stubs mirror the structure of the main ``pyarrow/`` package. + +When adding or modifying public APIs: + +1. **Update the corresponding ``.pyi`` stub file** in ``pyarrow-stubs/`` + to reflect the new or changed function/class signatures. + +2. **Include type annotations** where possible. For Cython modules or + dynamically generated APIs such as compute kernels add the corresponding + stub in ``pyarrow-stubs/``. + +3. **Run type checkers** to ensure the stubs are correct and complete. + +The stub files are automatically copied into the built wheel during the build +process and will be included when users install PyArrow, enabling type checking +in downstream projects and for users' IDEs. + +Note: ``py.typed`` marker file in the ``pyarrow/`` directory indicates to type +checkers that PyArrow supports type checking according to :pep:`561`. 
+ Doctest ======= diff --git c/python/MANIFEST.in i/python/MANIFEST.in index ed7012e..2840ba7 100644 --- c/python/MANIFEST.in +++ i/python/MANIFEST.in @@ -4,6 +4,7 @@ include ../NOTICE.txt global-include CMakeLists.txt graft pyarrow +graft pyarrow-stubs graft cmake_modules global-exclude *.so diff --git c/python/pyarrow-stubs/pyarrow/__init__.pyi i/python/pyarrow-stubs/pyarrow/__init__.pyi new file mode 100644 index 0000000000..2a68a51 --- /dev/null +++ i/python/pyarrow-stubs/pyarrow/__init__.pyi @@ -0,0 +1,26 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +"""Type stubs for PyArrow. + +This is a placeholder stub file. +Complete type annotations will be added in subsequent PRs. +""" + +from typing import Any + +def __getattr__(name: str) -> Any: ... diff --git c/python/pyarrow/py.typed i/python/pyarrow/py.typed new file mode 100644 index 0000000000..13a8339 --- /dev/null +++ i/python/pyarrow/py.typed @@ -0,0 +1,16 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. 
The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. diff --git c/python/pyproject.toml i/python/pyproject.toml index 899144d..9f62f02 100644 --- c/python/pyproject.toml +++ i/python/pyproject.toml @@ -84,11 +84,11 @@ zip-safe=false include-package-data=true [tool.setuptools.packages.find] -include = ["pyarrow"] +include = ["pyarrow", "pyarrow.*"] namespaces = false [tool.setuptools.package-data] -pyarrow = ["*.pxd", "*.pyx", "includes/*.pxd"] +pyarrow = ["*.pxd", "*.pyx", "includes/*.pxd", "py.typed"] [tool.setuptools_scm] root = '..' 
@@ -96,3 +96,39 @@ version_file = 'pyarrow/_generated_version.py' version_scheme = 'guess-next-dev' git_describe_command = 'git describe --dirty --tags --long --match "apache-arrow-[0-9]*.*"' fallback_version = '24.0.0a0' + +# TODO: Enable type checking once stubs are merged +[tool.mypy] +files = ["pyarrow-stubs"] +mypy_path = "$MYPY_CONFIG_FILE_DIR/pyarrow-stubs" +exclude = [ + "^pyarrow/", + "^benchmarks/", + "^examples/", + "^scripts/", +] + +# TODO: Enable type checking once stubs are merged +[tool.pyright] +pythonPlatform = "All" +pythonVersion = "3.10" +include = ["pyarrow-stubs"] +exclude = [ + "pyarrow", + "benchmarks", + "examples", + "scripts", + "build", +] +stubPath = "pyarrow-stubs" +typeCheckingMode = "basic" + +# TODO: Enable type checking once stubs are merged +[tool.ty.src] +include = ["pyarrow-stubs"] +exclude = [ + "pyarrow", + "benchmarks", + "examples", + "scripts", +] diff --git c/python/setup.py i/python/setup.py index a27bd3b..a25d2d7 100755 --- c/python/setup.py +++ i/python/setup.py @@ -121,8 +121,35 @@ class build_ext(_build_ext): def run(self): self._run_cmake() + self._copy_stubs() _build_ext.run(self) + def _copy_stubs(self): + """Copy .pyi stub files from pyarrow-stubs to the build directory.""" + build_cmd = self.get_finalized_command('build') + build_lib = os.path.abspath(build_cmd.build_lib) + + stubs_src = pjoin(setup_dir, 'pyarrow-stubs', 'pyarrow') + stubs_dest = pjoin(build_lib, 'pyarrow') + + if os.path.exists(stubs_src): + print(f"-- Copying stub files from {stubs_src} to {stubs_dest}") + for root, dirs, files in os.walk(stubs_src): + # Calculate relative path from stubs_src + rel_dir = os.path.relpath(root, stubs_src) + dest_dir = pjoin(stubs_dest, rel_dir) if rel_dir != '.' 
else stubs_dest + + # Create destination directory if needed + if not os.path.exists(dest_dir): + os.makedirs(dest_dir) + + # Copy .pyi files + for file in files: + if file.endswith('.pyi'): + src_file = pjoin(root, file) + dest_file = pjoin(dest_dir, file) + shutil.copy2(src_file, dest_file) + # adapted from cmake_build_ext in dynd-python # github.com/libdynd/dynd-python
Co-authored-by: Raúl Cumplido <raulcumplido@gmail.com>
change bat lint add a popd and nicer logging for windows ReplaceElipsis -> DocstringInserter simplify remove sphinx
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
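The wheel-content check added to `python_wheel_validate_contents.py` in the diff above (find `compute.pyi` in the archive and require a triple-quoted string in it) can be tried in isolation. The following is a minimal sketch under stated assumptions: `stub_has_docstrings` and `make_archive` are hypothetical helper names, and an in-memory zip stands in for a real wheel.

```python
import io
import zipfile


def stub_has_docstrings(archive: zipfile.ZipFile, suffix: str = "compute.pyi") -> bool:
    """Return True if the first archive member ending in `suffix` contains a docstring marker."""
    candidates = [info for info in archive.infolist() if info.filename.endswith(suffix)]
    if not candidates:
        raise FileNotFoundError(f"{suffix} not found in archive")
    content = archive.read(candidates[0]).decode("utf-8", errors="replace")
    # Same heuristic as the check in the diff: any triple quote counts.
    return '"""' in content


def make_archive(stub_source: str) -> zipfile.ZipFile:
    """Build an in-memory zip with a single pyarrow/compute.pyi member (stand-in for a wheel)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("pyarrow/compute.pyi", stub_source)
    buffer.seek(0)
    return zipfile.ZipFile(buffer)


documented = make_archive('def add(x, y):\n    """Add two values."""\n    ...\n')
bare = make_archive("def add(x, y): ...\n")

print(stub_has_docstrings(documented))
print(stub_has_docstrings(bare))
```

The heuristic only proves that some docstring made it into the stub, not that every function has one, which seems to be the intent of the check: it guards against shipping a wheel where the docstring-population step was skipped entirely.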
…d script for including docstrings into stubfiles before building wheels. diff --git c/.github/workflows/python.yml i/.github/workflows/python.yml index e5d3679..4ca0f9b 100644 --- c/.github/workflows/python.yml +++ i/.github/workflows/python.yml @@ -239,6 +239,11 @@ jobs: - name: Test shell: bash run: ci/scripts/python_test.sh $(pwd) $(pwd)/build + - name: Test annotations + shell: bash + env: + PYARROW_TEST_ANNOTATIONS: "ON" + run: ci/scripts/python_test_type_annotations.sh $(pwd)/python windows: name: AMD64 Windows 2022 Python 3.13 @@ -296,3 +301,7 @@ jobs: shell: cmd run: | call "ci\scripts\python_test.bat" %cd% + - name: Test annotations + shell: cmd + run: | + call "ci\scripts\python_test_type_annotations.bat" %cd%\python diff --git c/ci/scripts/python_test_type_annotations.bat i/ci/scripts/python_test_type_annotations.bat new file mode 100644 index 0000000000..3446e32 --- /dev/null +++ i/ci/scripts/python_test_type_annotations.bat @@ -0,0 +1,38 @@ +@Rem Licensed to the Apache Software Foundation (ASF) under one +@Rem or more contributor license agreements. See the NOTICE file +@Rem distributed with this work for additional information +@Rem regarding copyright ownership. The ASF licenses this file +@Rem to you under the Apache License, Version 2.0 (the +@Rem "License"); you may not use this file except in compliance +@Rem with the License. You may obtain a copy of the License at +@Rem +@Rem http://www.apache.org/licenses/LICENSE-2.0 +@Rem +@Rem Unless required by applicable law or agreed to in writing, +@Rem software distributed under the License is distributed on an +@Rem "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +@Rem KIND, either express or implied. See the License for the +@Rem specific language governing permissions and limitations +@Rem under the License. + +@echo on + +set PYARROW_DIR=%1 + +echo Annotation testing on Windows ... 
+ +@Rem Install library stubs +%PYTHON_CMD% -m pip install pandas-stubs scipy-stubs sphinx types-cffi types-psutil types-requests types-python-dateutil || exit /B 1 + +@Rem Install other dependencies for type checking +%PYTHON_CMD% -m pip install fsspec || exit /B 1 + +@Rem Install type checkers +%PYTHON_CMD% -m pip install mypy pyright ty || exit /B 1 + +@Rem Run type checkers +pushd %PYARROW_DIR% + +mypy +pyright +ty check diff --git c/ci/scripts/python_test_type_annotations.sh i/ci/scripts/python_test_type_annotations.sh new file mode 100755 index 0000000000..82610ce --- /dev/null +++ i/ci/scripts/python_test_type_annotations.sh @@ -0,0 +1,40 @@ +#!/usr/bin/env bash +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+ +set -ex +pyarrow_dir=${1} + +if [ "${PYARROW_TEST_ANNOTATIONS}" == "ON" ]; then + # Install library stubs + pip install pandas-stubs scipy-stubs sphinx types-cffi types-psutil types-requests types-python-dateutil + + # Install type checkers + pip install mypy pyright ty + + # Install other dependencies for type checking + pip install fsspec + + # Run type checkers + pushd ${pyarrow_dir} + mypy + pyright + ty check; +else + echo "Skipping type annotation tests"; +fi diff --git c/ci/scripts/python_wheel_macos_build.sh i/ci/scripts/python_wheel_macos_build.sh index bd61154..b64eee6 100755 --- c/ci/scripts/python_wheel_macos_build.sh +++ i/ci/scripts/python_wheel_macos_build.sh @@ -177,6 +177,11 @@ export CMAKE_PREFIX_PATH=${build_dir}/install export SETUPTOOLS_SCM_PRETEND_VERSION=${PYARROW_VERSION} pushd ${source_dir}/python +# We first populate stub docstrings and then build the wheel +python setup.py build_ext --inplace +python -m pip install griffe libcst +python ../dev/update_stub_docstrings.py pyarrow-stubs + python setup.py bdist_wheel popd diff --git c/ci/scripts/python_wheel_validate_contents.py i/ci/scripts/python_wheel_validate_contents.py index 84fcaba..ee4a31a 100644 --- c/ci/scripts/python_wheel_validate_contents.py +++ i/ci/scripts/python_wheel_validate_contents.py @@ -35,6 +35,11 @@ def validate_wheel(path): assert not outliers, f"Unexpected contents in wheel: {sorted(outliers)}" print(f"The wheel: {wheels[0]} seems valid.") + candidates = [info for info in f.filelist if info.filename.endswith('compute.pyi')] + assert candidates, "compute.pyi not found in wheel" + content = f.read(candidates[0]).decode('utf-8', errors='replace') + assert '"""' in content, "compute.pyi missing docstrings (no triple quotes found)" + def main(): parser = argparse.ArgumentParser() diff --git c/ci/scripts/python_wheel_windows_build.bat i/ci/scripts/python_wheel_windows_build.bat index b4b7fed..3da7f60 100644 --- c/ci/scripts/python_wheel_windows_build.bat +++ 
i/ci/scripts/python_wheel_windows_build.bat @@ -135,6 +135,11 @@ pushd C:\arrow\python @Rem Build wheel %PYTHON_CMD% setup.py bdist_wheel || exit /B 1 +@Rem We first populate stub docstrings and then build the wheel +%PYTHON_CMD% setup.py build_ext --inplace +%PYTHON_CMD% -m pip install griffe libcst +%PYTHON_CMD% ..\dev\update_stub_docstrings.py pyarrow-stubs + @Rem Repair the wheel with delvewheel @Rem @Rem Since we bundled the Arrow C++ libraries ourselves, we only need to diff --git c/ci/scripts/python_wheel_xlinux_build.sh i/ci/scripts/python_wheel_xlinux_build.sh index a3fbeb3..977ef64 100755 --- c/ci/scripts/python_wheel_xlinux_build.sh +++ i/ci/scripts/python_wheel_xlinux_build.sh @@ -167,6 +167,11 @@ export ARROW_HOME=/tmp/arrow-dist export CMAKE_PREFIX_PATH=/tmp/arrow-dist pushd /arrow/python +# We first populate stub docstrings and then build the wheel +python setup.py build_ext --inplace +python -m pip install griffe libcst +python ../dev/update_stub_docstrings.py pyarrow-stubs + python setup.py bdist_wheel echo "=== Strip symbols from wheel ===" diff --git c/compose.yaml i/compose.yaml index 2bd38a3..ae0a1d4 100644 --- c/compose.yaml +++ i/compose.yaml @@ -919,12 +919,14 @@ services: environment: <<: [*common, *ccache, *sccache] PYTEST_ARGS: # inherit + PYARROW_TEST_ANNOTATIONS: "ON" volumes: *conda-volumes command: &python-conda-command [" /arrow/ci/scripts/cpp_build.sh /arrow /build && /arrow/ci/scripts/python_build.sh /arrow /build && - /arrow/ci/scripts/python_test.sh /arrow"] + /arrow/ci/scripts/python_test.sh /arrow && + /arrow/ci/scripts/python_test_type_annotations.sh /arrow/python"] conda-python-emscripten: # Usage: @@ -1001,6 +1003,7 @@ services: ARROW_S3: "OFF" ARROW_SUBSTRAIT: "OFF" ARROW_WITH_OPENTELEMETRY: "OFF" + PYARROW_TEST_ANNOTATIONS: "ON" SETUPTOOLS_SCM_PRETEND_VERSION: volumes: *ubuntu-volumes deploy: *cuda-deploy @@ -1008,7 +1011,8 @@ services: /bin/bash -c " /arrow/ci/scripts/cpp_build.sh /arrow /build && 
/arrow/ci/scripts/python_build.sh /arrow /build && - /arrow/ci/scripts/python_test.sh /arrow" + /arrow/ci/scripts/python_test.sh /arrow && + /arrow/ci/scripts/python_test_type_annotations.sh /arrow/python" debian-python: # Usage: @@ -1500,6 +1504,7 @@ services: python: ${PYTHON} shm_size: *shm-size environment: + PYARROW_TEST_ANNOTATIONS: "ON" <<: [*common, *ccache, *sccache] PARQUET_REQUIRE_ENCRYPTION: # inherit HYPOTHESIS_PROFILE: # inherit @@ -1510,7 +1515,8 @@ services: /arrow/ci/scripts/cpp_build.sh /arrow /build && /arrow/ci/scripts/python_build.sh /arrow /build && mamba uninstall -y numpy && - /arrow/ci/scripts/python_test.sh /arrow"] + /arrow/ci/scripts/python_test.sh /arrow && + /arrow/ci/scripts/python_test_type_annotations.sh /arrow/python"] conda-python-docs: # Usage: @@ -1530,13 +1536,15 @@ services: BUILD_DOCS_CPP: "ON" BUILD_DOCS_PYTHON: "ON" PYTEST_ARGS: "--doctest-modules --doctest-cython" + PYARROW_TEST_ANNOTATIONS: "ON" volumes: *conda-volumes command: ["/arrow/ci/scripts/cpp_build.sh /arrow /build && /arrow/ci/scripts/python_build.sh /arrow /build && pip install -e /arrow/dev/archery[numpydoc] && archery numpydoc --allow-rule GL10,PR01,PR03,PR04,PR05,PR10,RT03,YD01 && - /arrow/ci/scripts/python_test.sh /arrow"] + /arrow/ci/scripts/python_test.sh /arrow && + /arrow/ci/scripts/python_test_type_annotations.sh /arrow/python"] conda-python-dask: # Possible $DASK parameters: diff --git c/docs/source/developers/python/development.rst i/docs/source/developers/python/development.rst index d03b243..c23891e 100644 --- c/docs/source/developers/python/development.rst +++ i/docs/source/developers/python/development.rst @@ -42,7 +42,7 @@ Unit Testing ============ We are using `pytest <https://docs.pytest.org/en/latest/>`_ to develop our unit -test suite. After `building the project <build_pyarrow>`_ you can run its unit tests +test suite. After `building the project <building.html>`_ you can run its unit tests like so: .. 
code-block:: @@ -101,6 +101,74 @@ The test groups currently include: * ``s3``: Tests for Amazon S3 * ``tensorflow``: Tests that involve TensorFlow +Type Checking +============= + +PyArrow provides type stubs (``*.pyi`` files) for static type checking. These +stubs are located in the ``pyarrow-stubs/`` directory and are automatically +included in the distributed wheel packages. + +Running Type Checkers +--------------------- + +We support multiple type checkers. Their configurations are in +``pyproject.toml``. + +**mypy** + +To run mypy on the PyArrow codebase: + +.. code-block:: + + $ cd arrow/python + $ mypy + +The mypy configuration is in the ``[tool.mypy]`` section of ``pyproject.toml``. + +**pyright** + +To run pyright: + +.. code-block:: + + $ cd arrow/python + $ pyright + +The pyright configuration is in the ``[tool.pyright]`` section of ``pyproject.toml``. + +**ty** + +To run ty (note: currently only partially configured): + +.. code-block:: + + $ cd arrow/python + $ ty check + +Maintaining Type Stubs +----------------------- + +Type stubs for PyArrow are maintained in the ``pyarrow-stubs/`` +directory. These stubs mirror the structure of the main ``pyarrow/`` package. + +When adding or modifying public APIs: + +1. **Update the corresponding ``.pyi`` stub file** in ``pyarrow-stubs/`` + to reflect the new or changed function/class signatures. + +2. **Include type annotations** where possible. For Cython modules or + dynamically generated APIs such as compute kernels add the corresponding + stub in ``pyarrow-stubs/``. + +3. **Run type checkers** to ensure the stubs are correct and complete. + +The stub files are automatically copied into the built wheel during the build +process and will be included when users install PyArrow, enabling type checking +in downstream projects and for users' IDEs. + +Note: ``py.typed`` marker file in the ``pyarrow/`` directory indicates to type +checkers that PyArrow supports type checking according to :pep:`561`. 
+ Doctest =======
diff --git c/python/MANIFEST.in i/python/MANIFEST.in
index ed7012e..2840ba7 100644
--- c/python/MANIFEST.in
+++ i/python/MANIFEST.in
@@ -4,6 +4,7 @@ include ../NOTICE.txt
 
 global-include CMakeLists.txt
 graft pyarrow
+graft pyarrow-stubs
 graft cmake_modules
 
 global-exclude *.so
diff --git c/python/pyarrow-stubs/pyarrow/__init__.pyi i/python/pyarrow-stubs/pyarrow/__init__.pyi
new file mode 100644
index 0000000000..2a68a51
--- /dev/null
+++ i/python/pyarrow-stubs/pyarrow/__init__.pyi
@@ -0,0 +1,26 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Type stubs for PyArrow.
+
+This is a placeholder stub file.
+Complete type annotations will be added in subsequent PRs.
+"""
+
+from typing import Any
+
+def __getattr__(name: str) -> Any: ...
diff --git c/python/pyarrow/py.typed i/python/pyarrow/py.typed
new file mode 100644
index 0000000000..13a8339
--- /dev/null
+++ i/python/pyarrow/py.typed
@@ -0,0 +1,16 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
diff --git c/python/pyproject.toml i/python/pyproject.toml
index 899144d..9f62f02 100644
--- c/python/pyproject.toml
+++ i/python/pyproject.toml
@@ -84,11 +84,11 @@ zip-safe=false
 include-package-data=true
 
 [tool.setuptools.packages.find]
-include = ["pyarrow"]
+include = ["pyarrow", "pyarrow.*"]
 namespaces = false
 
 [tool.setuptools.package-data]
-pyarrow = ["*.pxd", "*.pyx", "includes/*.pxd"]
+pyarrow = ["*.pxd", "*.pyx", "includes/*.pxd", "py.typed"]
 
 [tool.setuptools_scm]
 root = '..'
@@ -96,3 +96,39 @@ version_file = 'pyarrow/_generated_version.py'
 version_scheme = 'guess-next-dev'
 git_describe_command = 'git describe --dirty --tags --long --match "apache-arrow-[0-9]*.*"'
 fallback_version = '24.0.0a0'
+
+# TODO: Enable type checking once stubs are merged
+[tool.mypy]
+files = ["pyarrow-stubs"]
+mypy_path = "$MYPY_CONFIG_FILE_DIR/pyarrow-stubs"
+exclude = [
+    "^pyarrow/",
+    "^benchmarks/",
+    "^examples/",
+    "^scripts/",
+]
+
+# TODO: Enable type checking once stubs are merged
+[tool.pyright]
+pythonPlatform = "All"
+pythonVersion = "3.10"
+include = ["pyarrow-stubs"]
+exclude = [
+    "pyarrow",
+    "benchmarks",
+    "examples",
+    "scripts",
+    "build",
+]
+stubPath = "pyarrow-stubs"
+typeCheckingMode = "basic"
+
+# TODO: Enable type checking once stubs are merged
+[tool.ty.src]
+include = ["pyarrow-stubs"]
+exclude = [
+    "pyarrow",
+    "benchmarks",
+    "examples",
+    "scripts",
+]
diff --git c/python/setup.py i/python/setup.py
index a27bd3b..a25d2d7 100755
--- c/python/setup.py
+++ i/python/setup.py
@@ -121,8 +121,35 @@ class build_ext(_build_ext):
 
     def run(self):
         self._run_cmake()
+        self._copy_stubs()
         _build_ext.run(self)
 
+    def _copy_stubs(self):
+        """Copy .pyi stub files from pyarrow-stubs to the build directory."""
+        build_cmd = self.get_finalized_command('build')
+        build_lib = os.path.abspath(build_cmd.build_lib)
+
+        stubs_src = pjoin(setup_dir, 'pyarrow-stubs', 'pyarrow')
+        stubs_dest = pjoin(build_lib, 'pyarrow')
+
+        if os.path.exists(stubs_src):
+            print(f"-- Copying stub files from {stubs_src} to {stubs_dest}")
+            for root, dirs, files in os.walk(stubs_src):
+                # Calculate relative path from stubs_src
+                rel_dir = os.path.relpath(root, stubs_src)
+                dest_dir = pjoin(stubs_dest, rel_dir) if rel_dir != '.' else stubs_dest
+
+                # Create destination directory if needed
+                if not os.path.exists(dest_dir):
+                    os.makedirs(dest_dir)
+
+                # Copy .pyi files
+                for file in files:
+                    if file.endswith('.pyi'):
+                        src_file = pjoin(root, file)
+                        dest_file = pjoin(dest_dir, file)
+                        shutil.copy2(src_file, dest_file)
+
     # adapted from cmake_build_ext in dynd-python
     # github.com/libdynd/dynd-python
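For anyone who wants to try the stub-copy step outside of the build, here is a minimal standalone sketch of the same walk-and-copy logic as `_copy_stubs` in the diff above. The function name `copy_stubs`, its two path parameters, and the returned list of copied files are illustrative assumptions and not part of the PR; it also uses `os.makedirs(..., exist_ok=True)` in place of the explicit existence check.

```python
import os
import shutil
import tempfile


def copy_stubs(stubs_src, stubs_dest):
    """Copy every .pyi file under stubs_src to the same relative path under stubs_dest.

    Hypothetical helper for illustration only; returns the sorted relative
    paths of the files that were copied.
    """
    copied = []
    if not os.path.exists(stubs_src):
        return copied
    for root, _dirs, files in os.walk(stubs_src):
        # Path of the current directory relative to the stub root
        rel_dir = os.path.relpath(root, stubs_src)
        dest_dir = stubs_dest if rel_dir == '.' else os.path.join(stubs_dest, rel_dir)
        os.makedirs(dest_dir, exist_ok=True)
        for name in files:
            # Only stub files are copied; everything else is ignored
            if name.endswith('.pyi'):
                shutil.copy2(os.path.join(root, name), os.path.join(dest_dir, name))
                copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(copied)


if __name__ == "__main__":
    # Build a throwaway source tree and copy it, to show the behaviour
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "pyarrow-stubs", "pyarrow")
        dest = os.path.join(tmp, "build", "pyarrow")
        os.makedirs(os.path.join(src, "sub"))
        for rel in ("__init__.pyi", os.path.join("sub", "mod.pyi"), "notes.txt"):
            with open(os.path.join(src, rel), "w") as fh:
                fh.write("")
        print(copy_stubs(src, dest))
```

Non-stub files such as `notes.txt` are skipped, and a missing source directory yields an empty list rather than an error, which mirrors the `os.path.exists(stubs_src)` guard in the diff.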
12b6d99 to 4daf6ed
4daf6ed to 7873930
Rationale for this change
This is the second in a series of PRs that add type annotations to pyarrow and resolve #32609. It builds on #48618 and should be merged after it.
What changes are included in this PR?
This adds:
- _types.pyi - Core type definitions including
- _stubs_typing.pyi - Internal typing protocols and helpers used across stub files
- error.pyi - Exception classes (ArrowException, ArrowInvalid, ArrowIOError, etc.)
- lib.pyi, io.pyi, scalar.pyi - Placeholders using __getattr__ so that imports resolve, with complete annotations deferred to subsequent PRs
Are these changes tested?
Via CI type checks established in #48618.
Are there any user-facing changes?
Users will start seeing some minimal annotated types.
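A note on the `__getattr__` placeholders mentioned above: in a `.pyi` file, a module-level `def __getattr__(name: str) -> Any: ...` tells type checkers that any name not declared in the stub exists and has type `Any`, so imports resolve before the real annotations land. The same hook name also has a runtime meaning (PEP 562). The sketch below shows only that runtime analogue, to illustrate the "any name resolves" behaviour. The module name `placeholder` and the fallback function are hypothetical and are not how pyarrow behaves at runtime.

```python
import types
from typing import Any

# Hypothetical module standing in for a not-yet-annotated module
placeholder = types.ModuleType("placeholder")
placeholder.known = 42


def _module_getattr(name: str) -> Any:
    """PEP 562 fallback: called only when normal attribute lookup fails."""
    return f"<unresolved {name}>"


# Storing the function in the module namespace under the name __getattr__
# is what activates the fallback.
placeholder.__getattr__ = _module_getattr

print(placeholder.known)          # found normally, the fallback is not used
print(placeholder.anything_else)  # not defined, so the fallback answers
```

In the stub case nothing runs at all; the declaration is purely a signal to the type checker that the stub is incomplete.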